AITopics | self-supervised visual representation learning

Self-Supervised Visual Representation Learning from Hierarchical Grouping

Neural Information Processing Systems, Dec-24-2025, 13:27:08 GMT

We create a framework for bootstrapping visual representation learning from a primitive visual grouping capability. We operationalize grouping via a contour detector that partitions an image into regions, followed by merging of those regions into a tree hierarchy.

hierarchical grouping, name change, self-supervised visual representation learning, (3 more...)

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

Self-Supervised Visual Representation Learning with Semantic Grouping

Neural Information Processing Systems, Dec-24-2025, 09:32:35 GMT

In this paper, we tackle the problem of learning visual representations from unlabeled scene-centric data. Existing works have demonstrated the potential of utilizing the underlying complex structure within scene-centric data; still, they commonly rely on hand-crafted objectness priors or specialized pretext tasks to build a learning framework, which may harm generalizability. Instead, we propose contrastive learning from data-driven semantic slots, namely SlotCon, for joint semantic grouping and representation learning. The semantic grouping is performed by assigning pixels to a set of learnable prototypes, which can adapt to each sample by attentive pooling over the feature and form new slots. Based on the learned data-dependent slots, a contrastive objective is employed for representation learning, which enhances the discriminability of features, and conversely facilitates grouping semantically coherent pixels together. Compared with previous efforts, by simultaneously optimizing the two coupled objectives of semantic grouping and contrastive learning, our approach bypasses the disadvantages of hand-crafted priors and is able to learn object/group-level representations from scene-centric images. Experiments show our approach effectively decomposes complex scenes into semantic groups for feature learning and significantly benefits downstream tasks, including object detection, instance segmentation, and semantic segmentation.
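The grouping step described in this abstract (soft assignment of pixels to prototypes, then attentive pooling into slots) can be illustrated with a small sketch. This is not the SlotCon implementation: the function names, the cosine-similarity scoring, the `temperature` value, and the toy data are all assumptions chosen for illustration, and the contrastive objective is omitted.

```python
import math

def l2_normalize(v):
    """Scale a vector to unit length (zero vectors are left unchanged)."""
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]

def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def assign_pixels(features, prototypes, temperature=0.1):
    """Soft-assign each pixel feature to the prototypes.

    Assumption: scores are cosine similarities divided by a temperature,
    and the softmax is taken over prototypes for each pixel.
    """
    protos = [l2_normalize(p) for p in prototypes]
    assignments = []
    for f in features:
        f = l2_normalize(f)
        sims = [sum(a * b for a, b in zip(f, p)) / temperature for p in protos]
        assignments.append(softmax(sims))
    return assignments

def pool_slots(features, assignments):
    """Attentive pooling: each slot is the assignment-weighted mean of pixel features."""
    num_slots = len(assignments[0])
    dim = len(features[0])
    slots = []
    for k in range(num_slots):
        weights = [a[k] for a in assignments]
        total = sum(weights) or 1.0
        slots.append([
            sum(w * f[d] for w, f in zip(weights, features)) / total
            for d in range(dim)
        ])
    return slots

# Toy data (hypothetical): four "pixels" with 2-D features, two prototypes.
features = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
prototypes = [[1.0, 0.0], [0.0, 1.0]]
assignments = assign_pixels(features, prototypes)
slots = pool_slots(features, assignments)
```

In this toy case the first two pixels fall mostly under the first prototype and the last two under the second, so each slot ends up close to the mean feature of its group. In the method described above the prototypes are learned jointly with the features rather than fixed by hand.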

semantic group, name change, self-supervised visual representation learning, (7 more...)

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

Revitalizing CNN Attention via Transformers in Self-Supervised Visual Representation Learning

Neural Information Processing Systems, Dec-23-2025, 21:03:26 GMT

Studies on self-supervised visual representation learning (SSL) improve encoder backbones to discriminate training samples without labels. While CNN encoders via SSL achieve comparable recognition performance to those via supervised learning, their network attention is under-explored for further improvement. Motivated by the transformers that explore visual attention effectively in recognition scenarios, we propose a CNN Attention REvitalization (CARE) framework to train attentive CNN encoders guided by transformers in SSL. The proposed CARE framework consists of a CNN stream (C-stream) and a transformer stream (T-stream), where each stream contains two branches. C-stream follows an existing SSL framework with two CNN encoders, two projectors, and a predictor.

cnn encoder, revitalizing cnn attention, self-supervised visual representation learning, (8 more...)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (0.56)
Information Technology > Artificial Intelligence > Vision (0.36)

Review for NeurIPS paper: Self-Supervised Visual Representation Learning from Hierarchical Grouping

Neural Information Processing Systems, May-31-2025, 17:13:54 GMT

Summary and Contributions: [Post-rebuttal update] I thank the authors for addressing some of my concerns. However, I disagree with several of the arguments put forward in the rebuttal. I have nevertheless updated my overall score, as the authors provided or promised some of the requested experiments. I detail my concerns below. On the claim that the "aim of Self-supervised learning is to create universal visual representations": a substantial part of the community is still interested in transferable representations. This pursuit is valuable in its own right, as it tries to generalize to very novel settings with very limited data.

hierarchical grouping, representation, self-supervised visual representation learning, (6 more...)

Technology:

Information Technology > Data Science (0.62)
Information Technology > Artificial Intelligence > Machine Learning (0.38)

Self-Supervised Visual Representation Learning with Semantic Grouping

Neural Information Processing Systems, Oct-11-2024, 10:56:39 GMT

In this paper, we tackle the problem of learning visual representations from unlabeled scene-centric data. Existing works have demonstrated the potential of utilizing the underlying complex structure within scene-centric data; still, they commonly rely on hand-crafted objectness priors or specialized pretext tasks to build a learning framework, which may harm generalizability. Instead, we propose contrastive learning from data-driven semantic slots, namely SlotCon, for joint semantic grouping and representation learning. The semantic grouping is performed by assigning pixels to a set of learnable prototypes, which can adapt to each sample by attentive pooling over the feature and form new slots. Based on the learned data-dependent slots, a contrastive objective is employed for representation learning, which enhances the discriminability of features, and conversely facilitates grouping semantically coherent pixels together.

semantic group, contrastive learning, self-supervised visual representation learning, (4 more...)

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

Self-Supervised Visual Representation Learning from Hierarchical Grouping

Neural Information Processing Systems, Oct-11-2024, 06:22:31 GMT

We create a framework for bootstrapping visual representation learning from a primitive visual grouping capability. We operationalize grouping via a contour detector that partitions an image into regions, followed by merging of those regions into a tree hierarchy. Across a large unlabeled dataset, we apply this learned primitive to automatically predict hierarchical region structure. These predictions serve as guidance for self-supervised contrastive feature learning: we task a deep network with producing per-pixel embeddings whose pairwise distances respect the region hierarchy. Experiments demonstrate that our approach can serve as state-of-the-art generic pre-training, benefiting downstream tasks.
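To make "pairwise distances respect the region hierarchy" concrete, here is a minimal sketch under stated assumptions. It is a diagnostic check, not the paper's training loss: the region tree is given as hypothetical `parent` and `height` dictionaries, the hierarchy distance between two regions is taken to be the merge height of their lowest common ancestor, and `count_violations` simply counts triplets whose embedding distances contradict that ordering.

```python
import math

def ancestors(node, parent):
    """Return the path from a node up to the root, inclusive."""
    path = [node]
    while node in parent:
        node = parent[node]
        path.append(node)
    return path

def hierarchy_distance(a, b, parent, height):
    """Merge height of the lowest common ancestor of regions a and b."""
    seen = set(ancestors(a, parent))
    for node in ancestors(b, parent):
        if node in seen:
            return height[node]
    raise ValueError("regions do not share a root")

def embedding_distance(u, v):
    """Euclidean distance between two embedding vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(u, v)))

def count_violations(pixel_region, embeddings, parent, height):
    """Count triplets (i, j, k) where j is closer to i than k in the
    hierarchy, yet j is not closer to i than k in embedding space."""
    n = len(embeddings)
    violations = 0
    for i in range(n):
        for j in range(n):
            for k in range(n):
                if len({i, j, k}) < 3:
                    continue
                h_ij = hierarchy_distance(pixel_region[i], pixel_region[j], parent, height)
                h_ik = hierarchy_distance(pixel_region[i], pixel_region[k], parent, height)
                d_ij = embedding_distance(embeddings[i], embeddings[j])
                d_ik = embedding_distance(embeddings[i], embeddings[k])
                if h_ij < h_ik and d_ij >= d_ik:
                    violations += 1
    return violations

# Hypothetical tree: regions r0 and r1 merge into m, then m and r2 merge at the root.
parent = {"r0": "m", "r1": "m", "m": "root", "r2": "root"}
height = {"r0": 0.0, "r1": 0.0, "r2": 0.0, "m": 0.3, "root": 0.8}
pixel_region = ["r0", "r0", "r1", "r2"]  # region label of each pixel

# Embeddings whose distances follow the tree, and a shuffled set that does not.
good = [[0.0, 0.0], [0.1, 0.0], [1.0, 0.0], [5.0, 0.0]]
bad = [[0.0, 0.0], [5.0, 0.0], [1.0, 0.0], [0.1, 0.0]]
```

Calling `count_violations(pixel_region, good, parent, height)` yields no violations, while the `bad` embeddings produce some, because a pixel in the same region as pixel 0 is placed farther away than pixels from other regions. A training objective would penalize such orderings in a differentiable way; the exhaustive triplet loop here is only practical for tiny examples.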

hierarchical grouping, hierarchy, self-supervised visual representation learning

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

Revitalizing CNN Attention via Transformers in Self-Supervised Visual Representation Learning

Neural Information Processing Systems, Oct-9-2024, 18:31:29 GMT

Studies on self-supervised visual representation learning (SSL) improve encoder backbones to discriminate training samples without labels. While CNN encoders via SSL achieve comparable recognition performance to those via supervised learning, their network attention is under-explored for further improvement. Motivated by the transformers that explore visual attention effectively in recognition scenarios, we propose a CNN Attention REvitalization (CARE) framework to train attentive CNN encoders guided by transformers in SSL. The proposed CARE framework consists of a CNN stream (C-stream) and a transformer stream (T-stream), where each stream contains two branches. C-stream follows an existing SSL framework with two CNN encoders, two projectors, and a predictor.

cnn encoder, revitalizing cnn attention, self-supervised visual representation learning, (9 more...)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (0.59)
Information Technology > Artificial Intelligence > Vision (0.39)
